Groundtruth Budgeting : A Novel Approach to Semi - Supervised Relation Extraction in Medical Language

نویسنده

  • Ozlem Uzuner
چکیده

We address the problem of weakly-supervised relation extraction in hospital discharge summaries. Sentences with pre-identified concept types (for example: medication, test, problem, symptom) are labeled with the relationship between the concepts. We present a novel technique for weakly-supervised bootstrapping of a classifier for this task: Groundtruth Budgeting. In the case of highly-overlapping, self-similar datasets as is the case with the 2010 i2b2/VA challenge corpus, the performance of classifiers on the minority classes is often poor. To address this we set aside a random portion of the groundtruth at the beginning of bootstrapping which will be gradually added as the classifier is bootstrapped. The classifier chooses groundtruth samples to be added by measuring the confidence of its predictions on them and choosing samples for which it has the least confident predictions. By adding samples in this fashion, the classifier is able to increase its coverage of the decision space while not adding too many majority-class examples. We evaluate this approach on the 2010 i2b2/VA challenge corpus containing of 477 patient discharge summaries and show that with a training corpus of 349 discharge summaries, budgeting 10% of the corpus achieves equivalent results to a bootstrapping classifier starting with the entire corpus. We compare our results to those of other papers published in the proceedings of the 2010 Fourth i2b2/VA Shared-Task and Workshop. Thesis Co-Supervisor: Ozlem Uzuner Title: Assistant Professor of Information Studies SUNY, Albany Thesis Co-Supervisor: Peter Szolovits Title: Professor of Computer Science and Engineering MIT CSAIL

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model

Information extraction (IE) is a process of automatically providing a structured representation from an unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increased volume of information and heterogeneity, and non-structured form of it. One of the core information extraction tasks is relation extraction wh...

متن کامل

Using Graphs of Classifiers to Impose Declarative Constraints on Semi-supervised Learning

We propose a general approach to modeling semisupervised learning (SSL) algorithms. Specifically, we present a declarative language for modeling both traditional supervised classification tasks and many SSL heuristics, including both well-known heuristics such as co-training and novel domainspecific heuristics. In addition to representing individual SSL heuristics, we show that multiple heurist...

متن کامل

Improving Semi-Supervised Acquisition Of Relation Extraction Patterns

This paper presents a novel approach to the semi-supervised learning of Information Extraction patterns. The method makes use of more complex patterns than previous approaches and determines their similarity using a measure inspired by recent work using kernel methods (Culotta and Sorensen, 2004). Experiments show that the proposed similarity measure outperforms a previously reported measure ba...

متن کامل

Semi-Supervised Convolution Graph Kernels for Relation Extraction

Extracting semantic relations between entities is an important step towards automatic text understanding. In this paper, we propose a novel Semi-supervised Convolution Graph Kernel (SCGK) method for semantic Relation Extraction (RE) from natural English text. By encoding sentences as dependency graphs of words, SCGK computes kernels (similarities) between sentences using a convolution strategy,...

متن کامل

A Review of Relation Extraction

Many applications in information extraction, natural language understanding, information retrieval require an understanding of the semantic relations between entities. We present a comprehensive review of various aspects of the entity relation extraction task. Some of the most important supervised and semi-supervised classification approaches to the relation extraction task are covered in suffi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011